Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, and/or sensitivity to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on the recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when there is a high dissimilarity score between two non-homogeneous, temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change points in data streams while tolerating noise points. Moreover, the proposed online and offline versions of iCID can optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
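The idea of scoring two temporally adjacent intervals by their dissimilarity can be sketched as follows. This is only an illustration under stated assumptions: a plain Gaussian kernel stands in for the Isolation Distributional Kernel, the windows are fixed-length and non-overlapping, and the names `rbf`, `mean_kernel`, `dissimilarity`, and `interval_scores` are hypothetical, not taken from iCID.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two scalars (a stand-in for IDK, by assumption)."""
    return math.exp(-gamma * (x - y) ** 2)

def mean_kernel(a, b, gamma=1.0):
    """Average kernel value over all pairs: the inner product of the
    empirical kernel mean embeddings of intervals a and b."""
    return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))

def dissimilarity(a, b, gamma=1.0):
    """One minus the normalised similarity of the two interval distributions."""
    norm = math.sqrt(mean_kernel(a, a, gamma) * mean_kernel(b, b, gamma))
    return 1.0 - mean_kernel(a, b, gamma) / norm

def interval_scores(stream, window, gamma=1.0):
    """Score each boundary between two adjacent, equal-length windows."""
    scores = []
    for start in range(0, len(stream) - 2 * window + 1, window):
        left = stream[start:start + window]
        right = stream[start + window:start + 2 * window]
        scores.append(dissimilarity(left, right, gamma))
    return scores

# Toy stream whose mean jumps from about 0 to about 5 after six points.
stream = [0.0, 0.1, -0.1, 0.05, 0.0, 0.1, 5.0, 5.1, 4.9, 5.05, 5.0, 4.95]
scores = interval_scores(stream, window=3)
# Index of the window pair with the highest dissimilarity score.
change_at = max(range(len(scores)), key=scores.__getitem__)
```

In this toy stream the highest score falls on the second window pair, which straddles the jump. A real detector would also need a threshold and a way to choose the window length, which the abstract says iCID optimises itself.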
We study the composition style in deep image matting, a notion that characterizes a data generation flow on how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to combine foregrounds reasonably. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and bring consistent performance improvements on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
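For orientation, foreground-background composition and the combination of two foregrounds can be sketched per pixel as below. This is a sketch under assumptions: it uses the standard compositing equation and the classic "over" operator, which may differ from the formulation derived in the paper, and the function names `composite` and `over` are hypothetical.

```python
def composite(alpha, fg, bg):
    """Standard compositing equation for one pixel channel:
    I = alpha * F + (1 - alpha) * B."""
    return alpha * fg + (1.0 - alpha) * bg

def over(alpha1, fg1, alpha2, fg2):
    """Combine two foregrounds with the classic 'over' operator
    (foreground 1 in front of foreground 2). Returns (alpha, fg)
    of the combined layer. Assumed here, not taken from the paper."""
    alpha = alpha1 + alpha2 * (1.0 - alpha1)
    if alpha == 0.0:
        # Fully transparent pixel: the colour is arbitrary, use 0.
        return 0.0, 0.0
    premultiplied = alpha1 * fg1 + (1.0 - alpha1) * alpha2 * fg2
    return alpha, premultiplied / alpha

# Combining first and then compositing should match compositing the
# two layers one after the other onto the same background.
a, f = over(0.5, 1.0, 0.5, 0.0)
combined_then_composited = composite(a, f, 0.2)
layer_by_layer = composite(0.5, 1.0, composite(0.5, 0.0, 0.2))
```

The last two values agree, which is the consistency property a combined foreground and alpha pair should satisfy before it is used as a training sample.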
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Accurate activity location prediction is a crucial component of many mobility applications and is particularly required to develop personalized, sustainable transportation systems. Despite the widespread adoption of deep learning models, next location prediction models lack a comprehensive discussion and integration of mobility-related spatio-temporal contexts. Here, we utilize a multi-head self-attentional (MHSA) neural network that learns location transition patterns from historical location visits, their visit time and activity duration, as well as their surrounding land use functions, to infer an individual's next location. Specifically, we adopt point-of-interest data and latent Dirichlet allocation for representing locations' land use contexts at multiple spatial scales, generate embedding vectors of the spatio-temporal features, and learn to predict the next location with an MHSA network. Through experiments on two large-scale GNSS tracking datasets, we demonstrate that the proposed model outperforms other state-of-the-art prediction models, and reveal the contribution of various spatio-temporal contexts to the model's performance. Moreover, we find that the model trained on population data achieves higher prediction performance with fewer parameters than individual-level models because it learns from collective movement patterns. We also reveal that mobility conducted in the recent past and one week before has the largest influence on the current prediction, showing that learning from a subset of the historical mobility is sufficient to obtain an accurate location prediction result. We believe that the proposed model is vital for context-aware mobility prediction. The gained insights will help to understand location prediction models and promote their implementation for mobility applications.
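Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, such methods neither fully exploit the 3D point-wise geometric correspondences nor effectively tackle the ambiguities in photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that considers 3D spatial information and exploits stronger geometric constraints among adjacent camera frustums. Instead of directly regressing the pixel value from a single image, DevNet divides the camera frustum into multiple parallel planes and predicts the point-wise occlusion probability density on each plane. The final depth map is generated by integrating the density along the corresponding rays. During training, novel regularization strategies and loss functions are introduced to mitigate photometric ambiguities and overfitting. Without noticeably enlarging the model size or running time, DevNet outperforms several representative baselines on both the KITTI-2015 outdoor dataset and the NYU-V2 indoor dataset. In particular, DevNet reduces the root-mean-square deviation by around 4% on both KITTI-2015 and NYU-V2 in the task of depth estimation. Code is available at https://github.com/gitkaichenzhou/devnet.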
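Generating depth by integrating density along a ray can be sketched for a single ray as below. This is a minimal illustration, assuming the standard volume-rendering quadrature (per-plane opacity weighted by accumulated transmittance); the abstract does not spell out the exact integration rule, and the function name `depth_from_densities` is hypothetical.

```python
import math

def depth_from_densities(densities, plane_depths):
    """Expected depth along one ray from per-plane densities.

    densities[i] is the predicted density where the ray crosses plane i,
    plane_depths[i] is that plane's depth (increasing, at least two planes).
    Assumption: standard volume-rendering quadrature, not necessarily
    the exact rule used by DevNet.
    """
    transmittance = 1.0  # probability that the ray is still unoccluded
    weights = []
    for i, sigma in enumerate(densities):
        # Spacing to the next plane; the last plane reuses the previous spacing.
        if i + 1 < len(plane_depths):
            delta = plane_depths[i + 1] - plane_depths[i]
        else:
            delta = plane_depths[i] - plane_depths[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this slab
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    total = sum(weights)
    if total == 0.0:
        # Nothing along the ray is occupied: fall back to the farthest plane.
        return plane_depths[-1]
    return sum(w * d for w, d in zip(weights, plane_depths)) / total

# A ray that is empty until a dense surface at the third plane.
surface_depth = depth_from_densities([0.0, 0.0, 50.0, 0.0], [1.0, 2.0, 3.0, 4.0])
empty_depth = depth_from_densities([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])
```

Because the weights are differentiable in the densities, a depth computed this way can be trained end to end with a photometric loss, which fits the self-supervised setting described above.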
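Graph similarity measurement, which computes the distance or similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform the interaction information between two graphs into one hidden vector and then map it to a similarity score. To address this problem, this study proposes a more interpretable end-to-end paradigm for graph similarity learning, named Similarity Computation via Maximum Common Subgraph Inference (INFMCS). Our key insight behind INFMCS is the strong correlation between the similarity score and the Maximum Common Subgraph (MCS). We implicitly infer the MCS to obtain the normalized MCS size, with the similarity score as the only supervision signal during training. To capture more global information, we also stack some vanilla transformer encoder layers with graph convolution layers and propose a novel permutation-invariant node positional encoding. The entire model is quite simple yet effective. Comprehensive experiments demonstrate that INFMCS consistently outperforms state-of-the-art baselines on graph classification and regression tasks. Ablation experiments verify the effectiveness of the proposed computation paradigm and the other components. In addition, visualization and statistics of the results reveal the interpretability of INFMCS.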
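Currently, end-to-end deep-learning-based open-domain dialogue systems remain black-box models, which makes data-driven models prone to generating irrelevant content. Specifically, latent variables are highly entangled with different semantics in the latent space because there is no prior knowledge to guide training. To address this problem, this paper proposes to harness the generative model with prior knowledge through a cognitive approach involving mesoscopic-scale feature disentanglement. In particular, the model integrates macro-level guided-category knowledge and micro-level open-domain dialogue data into training and injects the prior knowledge into the latent space, which enables the model to disentangle the latent variables at the mesoscopic scale. In addition, we propose a new metric for open-domain dialogue that can objectively evaluate the interpretability of the latent space distribution. Finally, we validate our model on different datasets and experimentally demonstrate that it generates higher-quality and more interpretable dialogues than other models.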
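Because traditional empirical risk minimization (ERM) generalizes poorly under distribution shift, out-of-distribution (OOD) generalization algorithms are receiving increasing attention. However, OOD generalization algorithms overlook the large variance in the quality of training data, which significantly compromises the accuracy of these methods. In this paper, we theoretically reveal the relationship between training data quality and algorithm performance, and analyze the optimal regularization scheme for Lipschitz-regularized invariant risk minimization. Based on the theoretical results, a novel algorithm is proposed to alleviate the influence of low-quality data at both the sample level and the domain level. Experiments on both regression and classification benchmarks validate the effectiveness of our method with statistical significance.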
The current popular two-stream, two-stage tracking framework extracts the template and search region features separately and then performs relation modeling. As a result, the extracted features lack awareness of the target and have limited target-background discriminability. To tackle this issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance. Since no extra heavy relation modeling module is needed and the implementation is highly parallelized, the proposed tracker runs at a fast speed. To further improve the inference efficiency, an in-network candidate early elimination module is proposed based on the strong similarity prior calculated in the one-stream framework. As a unified framework, OSTrack achieves state-of-the-art performance on multiple benchmarks. In particular, it shows impressive results on the one-shot tracking benchmark GOT-10k, achieving 73.7% AO and improving the existing best result (SwinTrack) by 4.3%. Besides, our method maintains a good performance-speed trade-off and shows faster convergence. The code and models are available at https://github.com/botaoye/OSTrack.
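The candidate early elimination step can be pictured as keeping only the search-region tokens most similar to the template. The sketch below is an assumption-laden illustration, not the OSTrack implementation: it takes one similarity score per search token as given, and the function name `eliminate_candidates` and the parameter `keep_ratio` are hypothetical.

```python
import math

def eliminate_candidates(similarity, keep_ratio=0.7):
    """Return the indices of the search-region tokens to keep.

    similarity[i] is an assumed per-token score, for example the attention
    weight between the template and search token i. Tokens with the lowest
    scores are dropped so later layers process fewer candidates.
    """
    keep = max(1, math.ceil(len(similarity) * keep_ratio))
    # Rank token indices from most to least similar to the template.
    ranked = sorted(range(len(similarity)), key=lambda i: similarity[i], reverse=True)
    # Restore the original token order among the survivors.
    return sorted(ranked[:keep])

# Five search tokens; keep the top half, rounded up to three tokens.
kept = eliminate_candidates([0.1, 0.9, 0.3, 0.05, 0.6], keep_ratio=0.5)
```

Keeping the survivors in their original order preserves the spatial layout of the tokens, so dropped positions can be padded back before a prediction head that expects a full feature map.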
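Contemporary deep-learning object detection methods for autonomous driving usually assume predefined categories of common traffic participants, such as pedestrians and cars. Most existing detectors are unable to detect uncommon objects and corner cases (e.g., a dog crossing a street), which may lead to severe accidents in some situations and makes the real-world deployment of reliable autonomous driving uncertain. One main reason impeding the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases. Hence, we introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors. The dataset consists of 1500 carefully selected real-world driving scenes, each containing four object-level corner cases on average and together spanning more than 30 object categories. On CODA, the performance of standard object detectors trained on large-scale autonomous driving datasets drops significantly, to no more than 12.8% in mAR. Moreover, we experiment with a state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA, suggesting that a robust perception system for autonomous driving is probably still far from reach. We hope that our CODA dataset will facilitate further research on reliable detection for real-world autonomous driving. Our dataset will be released at https://coda-dataset.github.io.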